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Abstract 

Accurate density estimation methodologies play an integral role in a variety of scien- 
tific disciplines, with applications including simulation models, decision support tools, 
and exploratory data analysis. In the past, histograms and kernel density estimators 
have been the predominant tools of choice, primarily due to their ease of use and 
mathematical simplicity. More recently, the use of wavelets for density estimation 
has gained in popularity due to their ability to approximate a large class of functions, 
including those with localized, abrupt variations. However, a well-known attribute 
of wavelet bases is that they can not be simultaneously symmetric, orthogonal, and 
compactly supported. Multiwavelets — a more general, vector-valued, construction of 
wavelets — overcome this disadvantage, making them natural choices for estimating 
density functions, many of which exhibit local symmetries around features such as a 
mode. We extend the methodology of wavelet density estimation to use multiwavelet 
bases and illustrate several empirical results where multiwavelet estimators outperform 
their wavelet counterparts at coarser resolution levels. 

Keywords: Orthogonal series density estimation, non-parametric density estimation, 
wavelets, multiwavelets 



1. Introduction 

Many applications require estimating the underlying probability density function 
(PDF) of a finite sample making minimal or no assumptions about the generating func- 
tion. Having an accurate model of the underlying PDF enables one to understand the 
structure of the data and carry out more advanced statistical analysis such as classi- 
fication, confidence modeling, and clustering. Here we introduce for the first time a 
new class of density estimators based on a series expansion of multiwavelets ifl \vk . 
Multiwavelets can be created to retain all of the desirable properties of regular wavelets 
and, in addition, exhibit very desirable properties which wavelets do not: simultaneous 
symmetry, compact support, and orthogonality. These properties make multiwavelet 
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density estimation (MWDE) well-suited for reconstructing a wide class of density func- 
tions, particularly those that exhibit local or global symmetries. 

The primary contribution of our work is to introduce for the first time the use of mul- 
tiwavelets for density estimation. We also empirically compare MWDE performance 
to regular wavelet density estimation (WDE). Our focus is not to pit multiwavelets 
versus wavelets for the task of density estimation. We are still in the early stages 
of our research and continue to explore pros and cons associated with multiwavelets 
used as bases for density estimation. For example, the methodology we present here is 
strictly a linear multiwavelet estimator; hence, we do not discuss issues of thresholding 
the multiwavelet bases. Implementing an effective thresholding technique will yield 
sparser representations and should make MWDE more directly comparable to WDE. 
The richer mathematical properties afforded by multiwavelets demand we investigate 
their use for important applications such as density estimation. 



1.1. Relevant Work 

There are many well-studied density estimation techniques which we can loosely 
categorized into the taxonomy of parametric, semi-parametric, and nonparametric es- 
timators. Of these, nonparametric models are the most data-driven, requiring little to 
no assumptions about the underlying generative model of the data. These models in- 
clude the ubiquitous histogram described by Silverman ll35ll and the well-established 



kernel density estimators J32| 
later described by Watson 1145 



Though introduced in the 1960s by Cencov [6] and 
and improved by Anderson and de Figueiredo H, or- 
thogonal series estimation (OSE) lagged in popularity due to lack of suitable bases 
for the series expansion. Despite this lag, work was done on OSE by Hall [193 and 
Ahmad [ 1] to investigate the convergence rate and integrated mean square error prop- 
erties, respectively. Until about 25 years ago, the series expansions used for OSE were 
essentially limited to Fourier bases (i.e. sines and cosines) 12211 or orthogonal poly- 
nomials, e.g. Hermite 113 ill and Laguerre I120ll . The main downfall of these bases is 
their infinite support, demanding a large number of terms in the series expansion to 
accurately approximate complex densities containing multiple modes and abrupt vari- 
ations. With the advent and growing use of wavelets, we are now seeing more uses of 
OSE 1 15, 3(i SIH. In fact, Peter and Rangarajan ll28ll show wavelet density estimators 
(WDE) often outperform many other nonparametric density estimators. The main ad- 
vantage comes from the fact wavelets can be constructed with the convenient property 
of compact support ]9J], allowing them to easily and accurately represent functions with 
discontinuities and other abrupt local phenomena. In addition, WDE can be extended 
to non-linear estimators through the use of wavelet function coefficient thresholding in- 
troduced by Donoho et al. [11], which allows us to represent the density with a sparse 
set of coefficients while retaining accuracy. 

Unfortunately, wavelets can not be simultaneously compact, symmetric, and or- 
thogonal [9]. Many common analytic densities exhibit some form of symmetry either 
globally (e.g. Gaussian and Laplacian) or locally about their modes. The immedi- 
ate consequence of wavelets lacking symmetry (if they also want to be orthogonal and 
compact) is that more of them will be required, either via a multiresolution expansion or 
a very fine single level expansion, to reproduce the symmetries in the underlying densi- 
ties. Doing away with compactness and orthogonality allows us to have wavelet bases 
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that are symmetric, but these relinquished features are the very properties that made 
wavelets an attractive choice over trigonometric or polynomial bases. This leaves us 
with only one choice of bases satisfying all these desirable properties: multiwavelets. 
Multiwavelets are a more general, vector-valued construction of wavelets 



1711 . When used in a series expansion, they utilize multiple basis functions at ev- 



ery translate and at each resolution level. Like wavelets, multiwavelets can be con- 
structed to have compact support, allowing them to faithfully model local disconti- 
nuities. Furthermore, they can be orthogonal, making coefficient estimation simple. 
Unlike wavelets, they can be symmetric while maintaining their compactness and or- 
thogonality, allowing them to model local and global symmetries at coarser resolutions 
than wavelets. In this paper, we extend the concept of WDE to multiwavelet density 
estimation and demonstrate the utility of multiwavelets to model a variety of distribu- 
tions. To our knowledge, this is the first use of multiwavelets for density estimation. 

The remainder of this paper is organized as follows. The next section provides 
background knowledge of wavelets, WDE, and multiwavelets. In |3] we show how to 
construct a density estimator with multiwavelets. In §|4] we demonstrate the capabilities 
of MWDE using a wide variety of multiwavelet families and compare MWDE to the 
conventional WDE. Finally, we conclude with some observations and directions for 
future research. 



2. Review of Wavelets and Multiwavelets 

2.1. Standard Wavelets and Multiresolution Analysis 

Wavelets are refinable functions whose values are solutions to the dilation equation 

(x) = \fm ^ likty (mx — k) , (1) 

JfceZ 

where (x) is called the father wavelet (a.k.a. scaling function), m 6 Z is the dilation 
factor (in our case, and in most practical cases, m = 2), £ M are low-pass coefficients 
(the "filter") defining the wavelet, and k are integer translates of the father wavelet 
across the domain. The dilation factor is further controlled by a choice of resolution 
level, j E Z, that expands or contracts the wavelet. Hence we will typically denote the 
normalized basis function at resolution j and translate k as 0; k = 2 7//2 0(2 ; x — k). At 
a chosen resolution level j, the father wavelet and its integer translates form a basis 
for the function space Vj which is a subspace of the space of all square integrable 
functions L 2 (R) . These father wavelets capture the "smooth" or "averaging" properties 
of functions. Correspondingly, one can construct a set of mother wavelets which model 
the details (i.e. oscillating properties) of functions. These form a basis for the space W j, 
which consists of functions orthogonal to Vj. The mother wavelets y/ (x) (a.k.a. wavelet 
functions) are the members of W and are found using the high-pass filter coefficients 
gk and by solving the wavelet, two-scale equation 

Yr(jc) = V2£ ft *(2x-*). (2) 

keZ 

Eqs. [TJand|2]only provide values for <j> and y/ evaluated on the dyadic rationals. If we 
want to evaluate <p (x) at any ieR, then we simply interpolate using known values of 
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on the dyadic rationals. When this arises in the context of density estimation, we use 
a cubic spline interpolant. 

There are several families of wavelets — Daubechies, Coiflets, Symlets, just to name 
a few — all of which are constructed from and defined by unique sets of coefficients. 
Wavelets can be constructed to have the convenient properties of orthogonality across 
their integer translates and compact support; however, these properties can not be had 
simultaneously with perfect symmetry H. As discussed in $2.31 this drawback can be 
addressed through the use of multiwavelets. 

Scaling and wavelet bases can be brought together to represent functions in a mul- 
tiresolution expansion. Given a function / 6 L 2 (K), a multiresolution analysis (MRA) 
of that function yields 

/(*) = £ a,o W + £ E Pj,kYj,k M . (3) 

kez J=hkeZ 

where (Xj k S K are the father wavelet coefficients, and j3 ; - ± E R are the mother wavelet 
coefficients. The lowest resolution level used for function approximation is jo E Z, and 
all other resolutions are j E Z subject to j > jo- In reality, though, only a finite set 
of resolutions will be utilized such that jo < j < J < <*>. There are several approaches 
that can be applied for choosing /'o and J; in the context of density estimation, we refer 
interested readers to Vidakovic [41]. The MRA inL 2 (M) is a doubly-infinite sequence 
of nested subspaces V) 6 z 

•••CV_ 2 CV_iCVbCViCV 2 C--- (4) 

such that |~| j Vj = {0} and |J ; ' Vj = L 2 (M), allowing Vj to be used as a basis. The scaling 
and wavelet bases relationship is such that Wj is orthogonal to Vj, and at any resolution 
level j we have 

Vj = Vj- l ®W j - 1 , (5) 

justifying the expansion in Eq. |3] 

2.2. Wavelet Density Estimation 

Wavelet density estimation is a specific incarnation of orthogonal series density 
estimation (OSE), where one expands the density function p(x) in a multiresolution 
wavelet basis: 

P (x) = £ <*h*$h* (*) + £ E PhkYhk ( x ) ■ (6) 

kez i=iokei 

The objective now becomes determining the coefficients a, # and fij^ from a given 
independent and identically distributed (i.i.d.) sample X = {Xi} i=1 , The standard 
approach — though there are others I128ll — is to simply project the density function onto 
the basis expansion: 

Ujo,k = (p, <l>jo,k) = P{x) <l>j ,k (x) dx. (7) 
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This allows us to interpret the inner product of Eq. |7]as an expectation to find 

a h,k = J P(x) §h-k ( x ) dx = £ [(j)j Qik (x)] , (8) 

where S [■] is the expectation operator. Assuming a finite sample, the expectation is 
approximated by the sample mean. Hence, the coefficients CKy-t are estimated by 

1 N 

& jo,* = 77 £0jo,* ( 9 ) 
and similarly for the wavelet function coefficients J3; # as 

Pj*=itn*( x i), do) 
jv i=i 

to arrive at the approximation p (x) given by 

j 

p (x) = £ a io j 0; o ,* (jf) + £ £ j8 i)A Wj,k(x). (11) 

<teZ ;=;'o iteZ 

There are no guarantees, however, that the resulting density estimate p will satisfy 
the necessary properties of a density function (namely, J p = 1 and p > 0). So, once the 
density has been estimated, a post-processing normalization procedure |l4j] is typically 
performed to achieve these properties. It is worth noting there are ways to guarantee 



the resulting p is positive, such as using positive wavelets as in Walter and Shen [43] 



and even ways to ensure the resulting density will be positive and integrate to one, 
namely by estimating not p, but ( y/p) 2 = p with certain restrictions on the coefficients 
IUEII28]. 



The utility of representing a density function using an MRA stems from the ability 
to threshold detail coefficients and gain an economical (in the sense of sparse coeffi- 
cients), yet accurate estimator. These coefficients can be set to zero via "hard thresh- 
olding" techniques. Similarly, the larger mother wavelet function coefficients can be 
shrunk toward zero to reduce their contribution to the reconstruction, making the re- 
sulting estimate smoother; this is generally called "soft thresholding." In Donoho et al. 
Hill and Donoho and Johnstone [10], thresholding and its implications on convergence 
are analyzed to provably show WDE to be optimal under mini-max criteria. When 
coefficient thresholding is employed, WDE is a class of nonlinear density estimators. 
Our present focus is to introduce the use of multiwavelets in density estimation. To this 
end, we do not address the issue of thresholding coefficients when using multiwavelets 
as the bases of our density estimator. 

2.3. Multiwavelets 

Multiwavelets are a more general, vector-valued constructions of wavelets. Excel- 
lent foundations of multi wavelet theory are given in Strela f38ll and Keinert J2H1 . with 
origins traced back to Alpert fl, Goodman et al. [18|, and Goodman and Lee Il7ll . 
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A multiscaling function (x) is a vector-valued refinable function with multiplicity 
r € Z + of the form 

/ 01 (*) \ 

£(*) = ; , (12) 

and satisfying the refinement equation 

<j) (x) — \fm HfrQ (mx — k) , k,m<E'L, m>2, (13) 
— kez ~ 

where H\ are low-pass rxr matrices called the "recursion coefficients" defining the 
multiscaling function and collectively referred to as the "multifilter," paralleling the 
convention in standard wavelets. As before, we are concerned only with the dyadic 
case: m = r = 2. In fact, all density estimations with multiwavelets presented in this 
paper are performed, for the sake of simplicity, with multiwavelets of multiplicity r = 2. 
Multiscaling functions can be generated using a cascade algorithm of the same form as 
the standard wavelet cascade algorithm, but with matrix coefficients: 

£ W (x) = V2£flt0 (,,_1, (2x-ifc), (14) 

kez 

with an orthogonal <j>(°>, such as the Haar multiscaling function, paralleling the con- 
ventional Haar scaling function. 

Mother multiwavelet functions y/ (x) (in this case, of multiplicity 2) can be created 
using the multiwavelet equation (paralleling Eq. |2]for standard wavelets): 

y(x) = y/lJ^G k ${2x-k), (15) 

kez 

where Gt are high-pass rxr matrices. 

Analogous to wavelets in 32.21 MRAs can be constructed for multiwavelets using 

the same procedure yielding the set \ w j k '■ G z|. This set is a basis for the space 
W. Like with traditional scalar wavelets, the multiscaling functions form a basis for the 
spaces Vt e %, which are orthogonal to Wj e z- As previously stated, we are not presently 
concerned with coefficient thresholding, so all multiwavelet density estimations in this 
paper are made using strictly multiscaling functions. 

As with wavelets, there are many multiwavelet families — Shen-Tan-Tham (STT) 
by Shen et al. B34I1 . Donovan-Geronimo-Hardin-Massopust (DGHM) by Geronimo 
et al. II 1611 and Donovan et al. 11211 (with multiwavelet functions by Strang and Strela 



13711). and Chui-Lan (CL) by Chui and Lian [8], just to name a few. Figs. Q][2l and[3] 
illustrate the multiscaling and multiwavelet pairs for several well-known multiwavelet 
families. Like with standard wavelets, the properties of orthogonality and compact 
support are enforced during the construction of the recursion coefficients Hf, uniquely 
defining a multiwavelet. Unlike wavelets, however, multiwavelets can be simultane- 
ously symmetric and compactly supported while retaining orthogonality across their 
integer translates. This desirable property serves as the primary motivation for density 
estimation with a multiwavelet basis rather than the standard wavelet basis. 
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(c) 



(d) 



Figure 1: The DGHM multiwavelet has symmetric multiscaling functions Donovan 
et al. 11211 and symmetric/antisymmetric multiwavelet functions Strang and Strela 03711 . 
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Figure 2: The STT multiwavelet Shen et al. [34] has symmetric/antisymmetric multi- 
scaling and multiwavelet functions. 
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An interesting incarnation of multiwavelets are the so-called "balanced multiwavelets," 
first explicitly introduced by Lebrun and Vetterli I123ll and later more-rigidly formalized 
in Lebrun and Vetterli Il24ll . Selesnick 113 311 investigated the approximation properties 
of balanced multiwavelets, and later Lebrun and Vetterli 112511 generalized the concept 
of multiwavelet balancing to arbitrarily high order. This class of multiwavelets was 
developed to solve a problem in signal-processing arising from the vector nature of 
multiwavelets, where a one-dimensional input signal must be vectorized before being 
passed through a multifilter. This vectorization can result in undesirable effects on the 
signal reconstruction due to "unbalanced" channels in the lowpass coefficients of the 
multifilter. Through procedures proposed by Lebrun and Vetterli 12511 . a multiwavelet 
can be balanced to eliminate these undesirable features of the lowpass coefficients. In 
fact, standard wavelets, such as the Daubechies, Symlets, and Coiflets, can be used to 
construct balanced multiwavelets. A particular balanced multiwavelet is shown in Fig. 
[3] Here, we have taken the standard Daubechies wavelet of order 2 and, with the toolkit 
from Keinert 12111 . used it to construct a balanced multiwavelet of multiplicity r = 2. 
The resulting balanced multiwavelet is simply a compressed version of the original 
wavelet translated on the half-integers. We have illustrated this explicitly in Fig. |3]by 
showing the standard Daubechies wavelet on the first row and the components of the 
balanced multiwavelet on the remaining rows. We specifically mention balanced mul- 
tiwavelets in this paper as they produce some interesting, though unsurprising, results 
when used as a basis for MWDE. 



3. Multiwavelet Density Estimation 



Our objective is to approximate a density function p(x) using the multiwavelet basis 
in a form analogous to WDE. Again, the input is an i.i.d. sample of one-dimensional 



data X = {Xi} i=v and we aim to construct 



p (x) = £ k (*)+£ £ £j^, k (x) , (16) 

keZ j=kk&L 

where, analogous to the wavelet case, <p (x) — 2 ; / 2 <p (2->x — where <p is <j) or \jf. 
The coefficients 0£ ; o ^ and J3y _ k have become the r-dimensional vectors a JQ k and P 
We can expand Eq. Q~6] into its explicit vector form to see the reconstruction more 
clearly: 



a r,in.k 




(17) 

Evidently, the density function is completely described by the coefficients a_j Q k and 
j3 . , so the objective is to estimate a ;o k and j3 . as a ;o k and j3 . , respectively, us- 
ing only the sample X. Before we detail a projection approach similar to WDE, it is 
worth expanding on the notion of orthonormality as it applies to multiwavelets to make 
explicitly clear the idea of multiwavelet density estimation in an OSE environment. 
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X X 

(e) (f) 

Figure 3: The standard Daubechies wavelet of order 2 is on the first row to illustrate that 
the balanced Daubechies multiwavelet (last two rows) is just compressed and translated 
versions of the standard Daubechies wavelet. 
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Multiwavelets are orthonormal across integer translates k if 

<j>(x),<j>(x-ky) = / <p{x)<p*{x-k)dx = 5o k I, 



(18) 



where 5,-y is the Kronecker delta function, / is the rxr identity matrix, and 0* denotes 
the conjugate transpose of the vector 0. Note, that since G W for our purposes, 
(j>* = <f> T . So, expanding Eq. [18] we find that 



01 (x) 



01 (x - k) 



§ r (x — k) )dx = 5o/t 



<j) r (x) 
which implies 

/ (j>i (x) 0i (x — k) dx 

J (j> r (x) 0i (x — k) dx 
Finally, it is evident that 



/ 1 







(19) 



/ (pi (x) <p r (x — k) dx 

J (j) r (x) <p r (x — k) dx 

(pi (x) <l>j{x- k) dx = So k Sij 



= <>0k 



(20) 



(21) 



implying 0, (x) is orthonormal to ; - (x) when i ^ j across integer translates k; that is, 
can be constructed such that its elements are orthonormal across integer translates, 
justifying Eq. [16] The same conclusion holds for the mother multiwavelet functions. 
Therefore, we can calculate the coefficients a ,■ and B . , using the standard inner 
product projection: 



—jo-.k 



(22) 



and similarly for B fe : 



l hk = {p^ = j r p(x)Y jtk (x)dx. 
As before, this allows us to interpret the inner product in Eq. [22] as an expectation 



a 



which is approximated as the sample mean 



(23) 



(24) 
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The multiwavelet function coefficients . k are estimated as expected: 

h*=^%^- (25) 



The final approximation p (x) is given by 



p to = I £h*± k * to + L E Pj k Vj k to • ( 2 6) 

keZ i=hk&L 

This is the linear multiwavelet density estimator. As in the wavelet case, nonlinear 
MWDE is also possible using either hard or soft thresholding of the multiwavelet co- 



efficients p ... For instance, see the work already done by IIO, mH 13911 on multiwavelet 
coefficient thresholding in signals-processing applications. In addition to element-wise 
thresholding, Bacchelli and Papi [4] investigated vector-wise thresholding, taking ad- 
vantage of the fact that multiwavelet coefficients can be correlated in their vector rep- 
resentations. As in WDE, the resolution levels jo < j < J < °° are chosen by the user 
or by using model selection techniques as described, for instance, by Vidakovic ll42ll . 
We do not rigorously address the choice of resolution levels or coefficient thresholding 
here, as our objective is only to show that OSE can be successfully accomplished with 
multiwavelet bases. 



4. Experimental Results 

We detail several experiments using MWDE in comparison with standard WDE. 
We examine how multiwavelet symmetry can affect the reconstruction of densities with 
global and/or local symmetrical properties. Finally, we investigate the interesting case 
of using balanced multiwavelets. In addition to simple densities like the Gaussian, we 



extensively use the complicated density functions covered in Marron and Wand H26H 
and Wand and Jones [44]. These functions are, in many cases, multi-modal and con- 
tain local symmetries. We constructed a density estimator using various multiwavelet 
bases and measured the accuracy of the estimator under the integrated square error 
(ISE) between the estimated density p and the actual density p. For the illustrated test 
cases, we use a variety of multiwavelet families chosen by their regularity of appear- 
ance in the literature. The multiwavelets were computationally constructed using the 
iterative cascading algorithm H36H . This is a standard procedure used even for wavelet 
constructions, with implementation details specific to multiwavelets available in Kein- 
ertlElll. 



4.1. Multiwavelet Symmetry 

We begin by attempting to empirically motivate the advantages of MWDE versus 
WDE. Theoretically this was based on the fact that multiwavelets can be constructed 
with simultaneous symmetry, compact support, and orthogonality, whereas wavelets 
can not have these properties simultaneously. We developed a simple test to investi- 
gate the utility of the symmetry property by using multiwavelets and standard wavelets 
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to estimate the symmetric unimodal Gaussian and a bimodal distribution. For this 
comparison, we employed the commonly-used Daubechies wavelet for the WDE. For 
multiwavelets, we have a choice of several symmetric and symmetric/antisymmetric 
families that have already been developed. The DGHM multiwavelet [ 12] has symmet- 
ric multiscaling functions and symmetric/antisymmetric multiwavelet functions rf37ll : 



it is plotted in Fig. Q] The STT multiwavelet [34] has symmetric/antisymmetric mul- 
tiscaling and multiwavelet functions and is plotted in Fig. [2] We use both the DGHM 
and STT multiwavelets to demonstrate the capabilities of symmetric multiwavelets for 
density estimation. We do not employ any MRA for our density estimation, so the mul- 
tiwavelet functions are not of direct interest here. However, such a density estimator 
could be easily constructed following the methodology detailed in $3] 

First, we estimate a standard Gaussian density, as it has well-known properties, 
namely co-located mean/mode/median and symmetry. In Fig. |4] we show the DGHM 
multiwavelet outperforms the Daubechies wavelet of order 2 for resolution levels —2 < 
j < 0, but at j = 0, the Daubechies wavelet begins to produce a comparable estimate 
of the density, which continues to improve at finer resolutions. Given the highly asym- 
metrical properties of this lower-order Daubechies wavelet, the MWDE with a basis 
family such as DGHM was, as expected, able to outperform the WDE at some coarser 
resolution levels. 

It is well-known that the Daubechies wavelet of order 2 is not necessarily well- 
suited for estimating smooth densities, such as the Gaussian distribution; this is evi- 
dent from the jagged appearance of the wavelet. Similarly, the DGHM multiwavelet, 
though symmetrical, is also somewhat jagged in appearance. A more natural choice 
for estimating a smooth and symmetric density would be a higher-order wavelet, such 
as the Daubechies wavelet of order 5, and a smoother multiwavelet, such at the STT 
multiwavelet. With this, we compare WDE and MWDE using the more suitable bases 
just mentioned. The results are shown in Fig. [5] As anticipated, both the MWDE 
and the WDE are superior to the ones in Fig. |4] Even when using the Daubechies 
wavelet of order 5, we see that MWDE outperforms WDE at the coarse resolution lev- 
els —2 < j < 0. At finer resolution levels j > 0, we see, as in Fig. [4] the standard 
wavelet basis was able to produce a good density approximation. Being inherently 
symmetric, the multiwavelets are able to better reconstruct the symmetric peaks of 
these distributions, even at coarse resolutions. The standard wavelet resolutions must 
be increased to comparably model the symmetries. 

4.2. Balanced Multiwavelets 



From the work of Lebrun and Vetterli [24] and Lebrun and Vetterli [25], we know 



any wavelet can be used to construct a related balanced multiwavelet of arbitrarily high 
multiplicity. For instance, a Daubechies wavelet can be balanced to a Daubechies mul- 
tiwavelet of multiplicity r, with r multiscaling and r multiwavelet functions. Balanced 
multiwavelets are of particular interest to us because they provide a somewhat-less 
subjective method of comparing WDE to MWDE. That is, we can compare MWDE 
using a balanced Daubechies multiwavelet to WDE using the Daubechies wavelet used 
to produce the balanced multiwavelet. The multiscaling and multiwavelet functions of 
balanced Daubechies multiwavelets turn out to be compressed and translated versions 
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(e)7 = 2 (f)y = 3 

Figure 4: MWDE of a symmetrical density (Gaussian with mean and standard devia- 
tion 1) with 10000 samples across resolutions —2 < j < 3 with the symmetric DGHM 
multiwavelet compared with WDE using the asymmetric Daubechies wavelet of order 
2. 
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(e)i = 2 (f)7 = 3 

Figure 5: MWDE of a symmetrical density (Gaussian with mean and standard 
deviation 1) with 10000 samples across resolutions —2 < j < 3 with the sym- 
metric/antisymmetric STT multiwavelet compared with WDE using the asymmetric 
Daubechies wavelet of order 2. 
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of the corresponding scaling and wavelet functions of the Daubechies wavelet Il24lk 
this is evident from viewing Fig. [3] 

MWDE with balanced multiwavelets possesses some interesting properties, espe- 
cially for balanced Daubechies multiwavelets. Because balanced Daubechies multi- 
wavelets are simply compressed and translated versions of their wavelet counterparts, 
we expect MWDE and WDE with these bases to be closely comparable. In fact, we 
find exactly this. We empirically observe a very interesting — if not expected — property 
where, if MWDE with balanced multiwavelets of multiplicity r = 2 produces a cer- 
tain reconstruction at some resolution level j, then the corresponding WDE "lags" the 
MWDE, and produces the same, or very close to the same, reconstruction at resolution 
7 + 1. This is clearly illustrated in Fig. [6] where we estimate a bimodal distribution 
with 10000 samples using the Daubechies wavelet of order 5 for WDE and the balanced 
Daubechies multiwavelet of order 5 for MWDE. 

4.3. Other Multiwavelet Families 

There are many families of multiwavelets available, and all the orthogonal families 
can be utilized for density estimation using the methods presented in this paper. To 
show the utility of MWDE, we estimated a broad range of densities from Matron and 



Wand 112611 and Wand and Jones [44] using a variety of multiwavelet bases. Along with 
the MWDE, we show reconstructions using standard wavelet bases on the same plot; 
the WDE are presented here purely as benchmarks, not for pitting wavelets against 
multiwavelets. The results of these experiments are showcased in Fig. [7] It is worth 
noting that the WDE and MWDE are performed with the same resolution level in each 
density estimation plot. As we have seen in the previous two sections, the most evident 
difference in MWDE and WDE occurs when comparing across resolution levels. That 
is, MWDE tends to perform better than WDE at coarser resolutions. 

We also conducted experiments using the cross product of all the multiwavelets 
available in Keinert lEltl and the wavelets in Peter i27ll (these wavelets and multi- 
wavelets are listed in Tab. [TJ across resolutions —2 < j < 3 and for a variety of density 
functions from Matron and Wand Il26ll and Wand and Jones H44I1 . The parameters of 
the best (measured with ISE) MWDE and WDE reconstructions of each density are 
shown in Tab. [2] The STT multiwavelet was the most successful multiwavelet for 
density estimation, as may be expected from its smooth appearance. In addition to the 
wavelet/multiwavelet families, resolution level, and ISE for each density, we also list 
the number of coefficients needed in the expansion to achieve the given result. Note 
here that no MRA (thus no thresholding) has been implemented, so the coefficient 
counts presented are simply the numbers of coefficients required by the basis to span 
the domain of the sample. 



5. Discussion 

We are motivated to investigate MWDE, for we can use symmetric and symmet- 
ric/antisymmetric bases, such as the STT and DGHM multiwavelets, instead of being 
limited to the necessarily asymmetric orthogonal wavelets. So, we began our investi- 
gation by comparing the results of using the DGHM multiwavelet for MWDE and the 
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(e)7 = 2 (f)y = 3 

Figure 6: WDE using Daubechies wavelet of order 5 and MWDE using balanced 
Daubechies multiwavelet of order 5 compared on a skewed bimodal distribution with 
10000 samples at several resolution levels. Notice how the MWDE reconstruction at 
resolution j = —2 is very closely comparable to the WDE reconstruction at j — — 1 . 
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(a) Balanced Daubechies 6, Daubechies 6, j = 3 



— Truth 
---Wavelet 
3 5 |-- -Multiwavelet 



J , V 



(b) DGHM, Coiflet 5, j = 4 





(c) Multisymlet 10, Symlet 4, y = 



(d) Multisymlet 7, Daubechies 4, j = 2 



-Truth 
-Wavelet 
■-Multiwavelet 





(e) DGHM, DMey, j = 3 



(f) STT, Coiflet 5, j = 2 



Figure 7: Various densities, each with 10000 samples, approximated with a variety of 
multiwavelets and wavelets to showcase the utility of MWDE alongside the familiar 
WDE presented purely as a benchmark. The multiwavelet family, wavelet family, and 
resolution level j are provided, respectively, in each sub-figure. 
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Wavelet Families 



Multiwavelet Families 



Daubechies 2-10 
Symlets 4-10 
Coiflets 1-5 
Discrete Meyer 



Balanced Daubechies 2-10 
Multisymlets 4-10 
Chui-Lian 2-3 
Bat 
DGHM 
STT 



Table 1 : Wavelet and multiwavelet families used in the density estimations in Tab. |2] 







MWDE 






WDE 




Density 


ISE 


Res. 


Coeff. 


ISE 


Res. 


Coeff. 




(xlO- 3 ) 


j 


# 


(xlO" 3 ) 


j 


# 


Normal 


0.576 





24 


0.194 





38 


Bimodal 


0.230 





24 


0.124 


1 


44 


Skewed Bimodal 


0.183 


1 


40 


0.0997 


1 


46 


Claw 


1.25 


3 


128 


0.659 


3 


90 


Double Claw 


1.67 


2 


64 


1.33 


3 


85 



Table 2: Results of estimating a variety of densities with both MWDE and WDE with 
several families of wavelets across resolutions —2 < j < 3. Information about the 
density estimation with the lowest ISE is shown for both wavelets and multiwavelets. 
The wavelet and multiwavelet families used are listed in Tab. Q] 
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Daubechies wavelet of order 2 for WDE for estimating a symmetrical Gaussian distri- 
bution. As evident from Fig. |4] the DGHM multiwavelet outperforms the Daubechies 
wavelet at coarser resolution levels. Following this, we used the more suitable STT 
multiwavelet and Daubechies wavelet of order 5 to estimate the same density with 
much better results. The STT multiwavelet is much smoother than the DGHM multi- 
wavelet, as is evident from comparing Figs. [TJand|2] We see in Fig. [5] that both the 
MWDE and WDE perform well, particularly at resolutions j = and j = 1, respec- 
tively. The primary point of this empirical result is that the MWDE performed its best 
at a coarser resolution level than the wavelet. We will find this is a recurring theme 
throughout the tests we performed. This is an interesting relationship, one which we 
will continue to explore and develop more formally in later works. 

We also investigated a specific and interesting class of multi wavelets: the so-called 
"balanced multiwavelets." We tested MWDE using balanced multiwavelet bases and 
display the results in Fig. [6] The objective of this exercise was to examine the rela- 
tionship across resolution levels between MWDE and WDE using related bases. As 
shown by Lebrun and Vetterli [12411 . we can balance a standard Daubechies wavelet to 
a balanced Daubechies multiwavelet. We did this using the Daubechies wavelet of 
order 5 balanced to a multiwavelet of multiplicity 2. We then compared across reso- 
lutions the density estimations resulting from using the Daubechies wavelet of order 5 
for WDE and the balanced Daubechies multiwavelet of order 5 for MWDE. We found 
that MWDE and WDE result in very similar density estimates, but the wavelet "lags" 
the multiwavelet. That is, the MWDE at resolution level j is very close to the WDE at 
resolution j + 1 . This is expected in the case of Daubechies wavelets because the bal- 
anced Daubechies multiwavelets are just compressed and translated on the half-integers 
versions of the the base Daubechies wavelet. We demonstrated this in Fig. [3] 

There are, of course, many wavelet and multiwavelet families available with the 
necessary properties — namely orthogonality — for density estimation using the meth- 
ods presented in this paper. We performed MWDE and WDE using a variety of mul- 
tiwavelet and wavelet families on a broad sample of distributions and across several 
resolution levels. Our results are summarized in Tabs. Q~|and[2] We consistently see the 
STT multiwavelet and Coiflet 5 wavelet perform the best under the ISE. What is per- 
haps more interesting are the numbers of coefficients required in the various expansions 
to produce the density estimations. These are explicitly given in Tab. [2] As we have 
not used any non-linear density estimation, the numbers of coefficients presented are 
simply the numbers of coefficients required for the scaling and multiscaling functions 
to span the sample (which, in most cases, is contained on the unitless interval [—4,4]). 
From the table, we see that WDE, under the ISE, provides the best density estimation 
for the given distributions. However, in most cases, MWDE provides its best results 
at a coarser or equal resolution and, in all but one case, using fewer coefficients than 
the best WDE results. This could prove fruitful in terms of sparse representation if an 
MRA is constructed and non-linear density estimation is performed using multiwavelet 
bases. Finally, in Fig. |7J in order to show the utility of MWDE, we show a multitude 
of density estimations on various densities and using various multiwavelet families. 

In the cases we have investigated, we see that multiwavelets provide their best 
density estimation at resolution levels coarser than or equal to the best wavelet estima- 
tion for any particular density. This is expected, as a multiwavelet density estimator 
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is constructed such that there are multiple basis functions at every translate along the 
domain requiring two (or, generally, r) coefficients for every translate instead of just 
one as is needed by wavelets. In summary, our investigation shows a general trend that 
the MWDE converges to its best estimate at resolution levels coarser than or equal to 
comparable WDE and using a comparable or fewer number of coefficients. 

6. Conclusion 

In this paper, we have presented for the first time the use of multiwavelet bases 
for density estimation. We illustrated how to implement multiwavelet density estima- 
tion (MWDE) by projecting onto orthogonal multiwavelet bases. The utility of the 
approach was demonstrated by estimating several densities and comparisons were con- 
ducted with the well-established wavelet density estimation (WDE) as a benchmark. 
Our results show, in the large, that MWDE converges to its best estimate at resolution 
levels coarser than or equal to comparable WDE. Furthermore, the number of coeffi- 
cients required by the best MWDE was, in all but one case, less than the number of 
coefficients required by the best WDE. We also examined MWDE using balanced mul- 
tiwavelets and made some interesting empirical observations. We showed that WDE 
"lagged" MWDE by one resolution level when a balanced Daubechies multiwavelet of 
multiplicity 2 is used for the MWDE and the corresponding Daubechies wavelet is used 
for the WDE. Furthermore, we showed the STT multiwavelet was the best (measured 
by ISE) basis of the families we tested for estimating a broad range of densities. In fu- 
ture research, we plan to investigate non-linear MWDE through incorporation of vector 
thresholding techniques and direct non-negative density estimation by estimating yfp 
expanded in a multiwavelet basis. 
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